Because we use Xero customer data to tap into broader economic trends, it is important to understand how confidently we can generalise our results to the target population of small businesses. The closer our measurements are to the true values of the various indicators, the more valuable they are for analysis and decision-making. Unfortunately, the true value is usually unknown, so we can only estimate the accuracy (or the lack thereof). Generally speaking, the following aspects of accuracy need to be evaluated:
- Selectivity. Xero customer data is not the result of a random sampling design; it is generated organically. This inherent self-selection may be a source of bias. For example, some industries are typically under- or over-represented in the Xero dataset compared to the real population. Any known biases need to be corrected for, if feasible. The current analysis accounts for industry bias.
- Margin of error. Because we use a sample to estimate the value of an indicator at the population level, we need to understand how likely the estimate is to be reasonably close to the true value of the indicator.
- Precision. Are our indicator measurements reproducible?
The next section displays estimates for SBI indicators. The estimates are bias-corrected (accounting for variability in industry representation). The charts show the mean estimates with their 95% confidence intervals. (Precision in our case would mostly involve looking into reproducing measurements over time as the dataset evolves, which is likely to follow a similar pattern to established markets.)
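To make the idea of a bias-corrected mean with a 95% confidence interval concrete, here is a minimal sketch of one common approach, post-stratification by industry. It is not necessarily the method used for the SBI indicators. The function name, the industry labels, the sample values and the population shares are all hypothetical, and the interval relies on a normal approximation that ignores any finite population correction.

```python
import math
from statistics import mean, variance


def poststratified_estimate(samples, population_shares, z=1.96):
    """Industry-weighted mean with an approximate 95% confidence interval.

    samples: dict mapping industry -> list of observed indicator values
             (hypothetical stand-in for customer-level data).
    population_shares: dict mapping industry -> share of that industry in
             the real small-business population (assumed known, sums to 1).
    z: normal quantile; 1.96 corresponds to a 95% interval.
    """
    estimate = 0.0
    est_variance = 0.0
    for industry, share in population_shares.items():
        values = samples[industry]
        n = len(values)
        # Weight each industry's mean by its population share, not its sample share.
        estimate += share * mean(values)
        # Variance of a stratified mean: sum of share^2 * s^2 / n over industries.
        est_variance += share ** 2 * variance(values) / n
    half_width = z * math.sqrt(est_variance)
    return estimate, estimate - half_width, estimate + half_width


# Hypothetical data: construction is over-represented in the sample (4 of 7
# observations) relative to its assumed 50% share of the population.
samples = {
    "retail": [2.0, 4.0, 6.0],
    "construction": [10.0, 12.0, 14.0, 16.0],
}
population_shares = {"retail": 0.5, "construction": 0.5}

est, lower, upper = poststratified_estimate(samples, population_shares)
print(f"estimate={est:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

In this toy example the unweighted sample mean would be pulled towards the over-represented industry, whereas the weighted estimate reflects the assumed population mix. The width of the interval is the margin of error described above.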